Skip to content

Implement support for structured and struct zarr-extension defined dtypes#3781

Open
BrianMichell wants to merge 12 commits intozarr-developers:mainfrom
BrianMichell:implement_canonical_structs
Open

Implement support for structured and struct zarr-extension defined dtypes#3781
BrianMichell wants to merge 12 commits intozarr-developers:mainfrom
BrianMichell:implement_canonical_structs

Conversation

@BrianMichell
Copy link
Copy Markdown
Contributor

@BrianMichell BrianMichell commented Mar 16, 2026

Updates for consistency with the zarr-extension for structured types.

Implements legacy read-only for structured type and support for new struct definition.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@d-v-b
Copy link
Copy Markdown
Contributor

d-v-b commented Mar 16, 2026

this looks great, thanks for this. Instead of modifying the existing structured data type, can we make struct as a subclass of that data type, and register it? That means people who need the legacy dtype can still use it (by overriding the registry)

@BrianMichell
Copy link
Copy Markdown
Contributor Author

this looks great, thanks for this. Instead of modifying the existing structured data type, can we make struct as a subclass of that data type, and register it? That means people who need the legacy dtype can still use it (by overriding the registry)

Should be good to go now!

@BrianMichell
Copy link
Copy Markdown
Contributor Author

@d-v-b Is there anything I can do to move this PR forward?

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 31, 2026

Codecov Report

❌ Patch coverage is 93.10345% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.96%. Comparing base (c966cfa) to head (e7da243).

Files with missing lines Patch % Lines
src/zarr/codecs/bytes.py 60.00% 4 Missing ⚠️
src/zarr/core/dtype/npy/structured.py 97.29% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3781      +/-   ##
==========================================
- Coverage   92.98%   92.96%   -0.02%     
==========================================
  Files          87       87              
  Lines       11246    11316      +70     
==========================================
+ Hits        10457    10520      +63     
- Misses        789      796       +7     
Files with missing lines Coverage Δ
src/zarr/core/array.py 97.64% <100.00%> (ø)
src/zarr/core/dtype/__init__.py 100.00% <100.00%> (ø)
src/zarr/dtype.py 100.00% <ø> (ø)
src/zarr/core/dtype/npy/structured.py 97.40% <97.29%> (-1.54%) ⬇️
src/zarr/codecs/bytes.py 92.75% <60.00%> (-5.58%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@d-v-b
Copy link
Copy Markdown
Contributor

d-v-b commented Apr 1, 2026

@d-v-b Is there anything I can do to move this PR forward?

hi @BrianMichell thanks for the ping, I will get a review in soon. since numpy is handling the actual data type, I think we will ask much less of you than the tensorstore devs :)

raise DataTypeValidationError(
f"Invalid data type: {dtype}. Expected an instance of {cls.dtype_cls}"
)
raise DataTypeValidationError(f"Use 'Struct' for native dtype matching. Got: {dtype}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to modify the Structured data type at all? I would think we can keep it exactly as-is, but do not register it in the data type registry. Users who need backwards compatibility can explicitly unregister the new data type, and register the old one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this design choice to support transparent r+ of Structured dtype and also r+/w for Struct. This implementation should support my usecase where we have petabyte scale Structured legacy data and expect to move forward with generating new data using Struct without requiring either a remaster or client code to register/unregister on the fly.

Copy link
Copy Markdown
Contributor

@d-v-b d-v-b Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we remove Structured from the registry, and Struct accepts data_type metadata that would have worked for Structured, then doesn't this achieve your goal? The idea is that the input type for Struct is wider than the input type for Structured, so any valid Structured would also be a valid Struct data type. But if for some reason someone needs to use the old Structured data type, it is still there, untouched. They just have to opt in to using it.

edit, I mixed up Struct and Structured

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants